This documentation provides a detailed annotation of the AlexNet architecture, explaining the purpose and functionality of each layer, the role of hyperparameters, and how the network contributes to efficient feature extraction and classification tasks.
# Python libraries
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Importing tensorflow libraries
import tensorflow as tf
from tensorflow import keras
# Graph Visualization
import pydot
import graphviz
import warnings
warnings.filterwarnings("ignore")
AlexNet is an image classification model that made a big impact on deep learning. It was developed by Geoffrey Hinton and his team in 2012, and it played a crucial role in demonstrating the power of CNN (Convolutional Neural Network) architectures for large-scale applications. Before AlexNet, people were skeptical about whether deep learning could be applied successfully to very large image datasets. However, a group of researchers were determined to show that deep neural networks were the way forward. Their success with AlexNet greatly increased interest in deep learning after 2012. Hence, AlexNet was born out of the need to improve the results of the ImageNet challenge. This was one of the first Deep convolutional networks to achieve considerable accuracy on the 2012 ImageNet LSVRC-2012 challenge.It did much better than the second-place model by lowering the top-5 error rate from 26.2% to 15.3%. The idea of spatial correlation in an image frame was explored using convolutional layers and receptive fields. Thus, ImageNet was also crucial for the advancement of Deep Learning.
Hence, AlexNet was born out of the need to improve the results of the ImageNet challenge. This was one of the first Deep convolutional networks to achieve considerable accuracy on the 2012 ImageNet LSVRC-2012 challenge.It did much better than the second-place model by lowering the top-5 error rate from 26.2% to 15.3%. The idea of spatial correlation in an image frame was explored using convolutional layers and receptive fields. Thus, ImageNet was also crucial for the advancement of Deep Learning.
Before AlexNet, computer vision was mostly dominated by techniques like Support Vector Machines (SVMs) and simple neural networks because training deeper models with millions of parameters seemed unachievable. Let's explore why this was the case and the challenges faced by earlier machine learning models.
Due to these challenges, such as limited computational power, the vanishing gradient issue, and the absence of large training datasets, most neural networks were shallow, limiting their ability to generalize well.
The primary motivation behind AlexNet and the research paper stems from the need to significantly improve the accuracy of image classification systems, particularly in the context of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). Before AlexNet, traditional image classifiers struggled with high variability in images due to factors like different object positions, sizes, and environments which lead to poor generalization on large-scale image datasets. The ImageNet dataset, which includes over 15 million labeled high-resolution images in over 22,000 categories, which previous models based on shallow learning architectures and handcrafted features were not able to handle effectively.
The problems this paper sought to address included:
This AlexNet paper introduced several key innovations that enabled it to resolve these problems effectively, marking significant breakthroughs in the field of deep learning for image classification:
The problems this paper sought to address included:
These innovations combined made AlexNet particularly effective, enabling it to outperform other models by a significant margin in the ImageNet competition and lowering the top 5 error rate dramatically to 17.0% compared to previous state-of-the-art technologies, significantly outperforming the second-best entry at the time, which had an error rate of 26.2%. This breakthrough demonstrated the potential of deep convolutional neural networks in image recognition tasks, setting a new standard in the field and inspiring subsequent research and advancements.
ImageNet is a vast dataset with over 15 million high-resolution images categorized into approximately 22,000 categories. These images were sourced from the internet and labeled by people using Amazon’s Mechanical Turk. Since 2010, an annual competition known as the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) has been part of the Pascal Visual Object Challenge. The competition is significant as it encourages researchers to test the boundaries of current models and develop new approaches. Many pioneering models, including AlexNet, have emerged from this competition. ILSVRC uses a smaller portion of ImageNet, with about 1,000 images in each of 1,000 categories, totaling around 1.2 million training images, 50,000 validation images, and 150,000 testing images. The 2010 version of ILSVRC is unique because it’s the only version with available test set labels, which is why most experiments were performed on it.
In the ImageNet challenge, they adjusted ImageNet's variable-resolution images to a consistent size by down-sampling them to 256x256 pixels. They resized the images so the shorter side was 256 pixels, then cropped the central 256x256 patch. The only preprocessing done was subtracting the average activity over the training set from each pixel. Thus, the network was trained on the centered raw RGB values of the pixels.
AlexNet, a deep convolutional neural network to classify 1.2 million high-resolution images into 1000 different categories for the ImageNet LSVRC-2010 contest. This network comprises eight learned layers, including five convolutional and three fully-connected layers with 1000-way softmax, achieved top-1 and top-5 error rates of 37.5% and 17.0%, respectively in the contest. These rates significantly surpassed previous benchmarks. It has a network, with 60 million parameters and 650,000 neurons, used advanced techniques like non-saturating neurons, efficient GPU-based convolution operations, and a dropout regularization method to reduce overfitting. This model variant won the ILSVRC-2012 competition with a top-5 error rate of 15.3%.
AlexNet features a relatively straightforward structure when compared to more recent Deep Learning models. It includes eight layers in total: five convolutional layers and three fully connected layers. The model incorporates several pioneering features from its era such as ReLU activation functions, Local Response Normalization (LRN), and Overlapping Max Pooling.
AlexNet processes images with an input size of 224x224x3 RGB pixels.
To prevent overfitting, it's useful to expose the neural network to different versions of the same image. This not only helps generate more data but also encourages the network to learn and remember key features.
For example, if our training set includes an image of a cat, the network can also learn from the mirror image of that cat. By flipping the image horizontally, we effectively double the size of our training dataset.
Taking random crops from the original image can also create additional data variations. These variations are essentially the original data but shifted in position.
In AlexNet, random 224x224 crops from the 256x256 original images as inputs for the network. This method increased the dataset size by 2048 times.
import tensorflow as tf
def calculate_mean_std(dataset):
pixel_sum, pixel_square_sum, num_pixels = 0.0, 0.0, 0
for image, _ in dataset:
pixel_sum += tf.reduce_sum(image)
pixel_square_sum += tf.reduce_sum(tf.square(image))
num_pixels += tf.size(image)
mean = pixel_sum / tf.cast(num_pixels, tf.float32)
std_dev = tf.sqrt(
pixel_square_sum / tf.cast(num_pixels, tf.float32) - tf.square(mean)
)
return mean, std_dev
def normalize_image(image, mean, std_dev):
# Subtract mean and divide by standard deviation
return (image - mean) / std_dev
# Function to normalize image and label pairs
def normalize_image_dataset(image, label):
image = normalize_image(image, mean, std_dev)
return image, label
# Function to apply horizontal reflection
def horizontal_reflection(image, label):
image = tf.image.flip_left_right(image)
return image, label
# Function to adjust brightness
def adjusting_brightness(image, label):
image = tf.image.adjust_brightness(image, 0.2)
return image, label
def DataAugmentation(dataset):
# Applying Mean Normalization
dataset = dataset.map(normalize_image_dataset, num_parallel_calls=tf.data.AUTOTUNE)
# Applying Horizontal flip
dataset = dataset.map(horizontal_reflection, num_parallel_calls=tf.data.AUTOTUNE)
# Applying brightness adjustment
dataset = dataset.map(adjusting_brightness, num_parallel_calls=tf.data.AUTOTUNE)
return dataset
Here we calculate the mean and standard deviation of dataset, normalizes images based on these statistics, and applies data augmentation techniques like horizontal flipping and optional brightness adjustment. The transformations are applied to both training and validation datasets, enhancing generalization and efficiency during model training.
Another key aspect of AlexNet is its use of the ReLU (Rectified Linear Unit) nonlinearity, which marked a departure from the traditional tanh or sigmoid activation functions. It was not the first time in exploring alternatives to conventional neuron models in CNNs. For instance, Jarrett et al. [3] have demonstrated that the nonlinearity \( f(x) = | tanh(x) | \) is particularly effective when combined with their method of contrast normalization and local average pooling on the Caltech-101 dataset. ReLUs allow deep CNNs to train much faster than networks using saturating activations like tanh or sigmoid. This is demonstrated in a comparison where using ReLUs enabled AlexNet to achieve a 25% training error rate six times faster than using tanh, as tested on the CIFAR-10 dataset.
The ReLU function is defined as f(x) = max(0, x). This function does not saturate in the same way tanh does because its derivative remains constant (and non-zero) for positive inputs, which facilitates faster convergence during optimization. For negative values, although the derivative is zero, it generally does not affect training speed adversely because most neurons tend to take on positive values. This characteristic makes ReLUs superior to sigmoid functions as well, which also suffer from saturation issues.
import tensorflow as tf
# ReLU activation function
relu = tf.keras.activations.relu
# Example tensor from a numpy array
x = tf.constant([1.0, -1.0, -0.5, 0.32])
# Applying ReLU
relu_x = relu(x)
#Output = [1.0, 0.0, 0.0, 0.32]
The graphs of the tanh and ReLU activation functions illustrate the differences in their behavior. The tanh function saturates for very high or very low values of z, where its slope approaches zero. This property can hinder the speed of gradient descent because changes in the input produce minimal changes in the output. Conversely, the ReLU function maintains a constant, non-zero slope for positive values of z, which aids in faster convergence during optimization. For negative values, although the slope is zero, it doesn't generally slow down learning, as most neurons in a neural network activate positively. For these reasons, ReLU is often preferred over the sigmoid function, which similarly suffers from saturation.
Convolutional layers are the foundation of convolutional neural networks (CNNs), which excel at tasks like image and video recognition, as well as other data involving spatial patterns. Here’s an in-depth look at its key components:
The convolution operation involves sliding a filter (or kernel) over the input data, performing element-wise multiplication and summation. This process helps extract important features, such as edges, textures, and patterns.
Filters are small matrices (typically 3x3, 5x5, etc.) applied to the input data to detect specific features, such as edges, corners, or textures. Each filter learns to recognize different patterns, with its values being updated during the training process.
Stride refers to how the filter moves across the input data in terms of rows and columns. For example, with a stride of 1, the filter moves one pixel at a time both horizontally (across columns) and vertically (across rows). With a stride of 2, the filter moves two pixels at a time in both directions, which reduces the size of the output feature map.
Padding involves adding extra pixels, typically zeros, around the input data. This helps control the spatial dimensions of the output feature map and can preserve the size of the input after convolution. Padding is especially useful in deeper networks, where maintaining the input size is crucial for effective processing.
In convolutional layers, each filter (or kernel) slides across the input image, computing dot products between the filter values and the image pixel values in the area the filter covers. This produces an output feature map where each "pixel" represents a feature detected in that region.
__init__ method initializes the convolutional layer with the specified parameters: number of filters, kernel size, stride, and padding type.initialize_kernels method initializes the convolutional kernels with random values. These kernels are used to scan the input image and extract features. The number of input channels is required to determine the shape of the kernels.apply_padding method applies zero padding around the image if padding='same' is specified.forward method, the input image is padded if necessary, and then we calculate the output dimensions based on the stride and kernel size.We create a simulated RGB image of shape 227 x 227 x 3 to match the input size of AlexNet. We then initialize a Conv2D layer to represent the first layer of AlexNet, with 96 filters, an 11 x 11 kernel, a stride of 4, and padding to preserve the spatial dimensions. When we apply this layer to the image, the output shape is 55 x 55 x 96, which represents the downsampled feature maps for each of the 96 filters in this layer.
This code is a simplified version, focusing specifically on mimicking Conv2D functionality with its distinct input size, kernel, stride, and number of filters.
import numpy as np
class Conv2D:
def __init__(self, filters, kernel_size, stride=1, padding='valid'):
'''
Initializes the Conv2D layer.
Parameters:
- filters (int): Number of filters to apply.
- kernel_size (tuple): Size of the kernel (height, width).
- stride (int): Step size for moving the kernel over the input.
- padding (str): 'valid' (no padding) or 'same' (padding to keep spatial dimensions).
'''
self.filters = filters
self.kernel_size = kernel_size
self.stride = stride
self.padding = padding
self.kernels = None # To hold the kernels once initialized
def initialize_kernels(self, input_channels):
'''
Initializes the kernels (filters) for each input channel.
Parameters:
- input_channels (int): Number of channels in the input image.
This method randomly initializes the kernels with a normal distribution.
Each kernel has a shape of (filters, kernel_height, kernel_width, input_channels).
'''
kernel_height, kernel_width = self.kernel_size
# Randomly initialize kernels for each filter and input channel
self.kernels = np.random.randn(self.filters, kernel_height, kernel_width, input_channels)
def apply_padding(self, image, pad_h, pad_w):
'''
Pads the input image with zeros if necessary.
Parameters:
- image (np.array): The input image.
- pad_h (int): Height of padding.
- pad_w (int): Width of padding.
Returns:
- np.array: The padded image.
'''
return np.pad(image, ((pad_h, pad_h), (pad_w, pad_w), (0, 0)), mode='constant', constant_values=0)
def forward(self, image):
'''
Performs the convolution operation on the input image.
Parameters:
- image (np.array): Input image with shape (height, width, channels).
Returns:
- np.array: Output of the convolution operation with shape (output_height, output_width, filters).
'''
# Initialize the kernels if not done already
if self.kernels is None:
self.initialize_kernels(input_channels=image.shape[2])
# Image dimensions and kernel dimensions
image_height, image_width, image_channels = image.shape
kernel_height, kernel_width, _ = self.kernels.shape[1:]
# Calculate padding if needed
if self.padding == 'same':
pad_h = (kernel_height - 1) // 2
pad_w = (kernel_width - 1) // 2
image = self.apply_padding(image, pad_h, pad_w)
elif self.padding == 'valid':
pad_h, pad_w = 0, 0
# Calculate output dimensions
output_height = (image.shape[0] - kernel_height) // self.stride + 1
output_width = (image.shape[1] - kernel_width) // self.stride + 1
# Initialize the output array to store feature maps for each filter
result = np.zeros((output_height, output_width, self.filters))
# Perform convolution
for f in range(self.filters):
# Select the kernel for the current filter
kernel = self.kernels[f]
for y in range(output_height):
for x in range(output_width):
# Extract the region of interest for the current location and all channels
region = image[
y * self.stride : y * self.stride + kernel_height,
x * self.stride : x * self.stride + kernel_width,
:
]
# Apply the filter to the region and sum across channels
result[y, x, f] = np.sum(region * kernel)
return result
# Example usage
# Simulated RGB image with dimensions (227, 227, 3)
image = np.random.randn(227, 227,3)
# Instantiate the Conv2D layer with 96 filters, an 11x11 kernel size, stride of 4, and 'same' padding
conv_layer = Conv2D(filters=96, kernel_size=(11, 11), stride=4, padding='valid')
# Apply the convolution layer to the image
output = conv_layer.forward(image)
print("Image shape:\n", image.shape)
print("Output Shape of convolution:\n", output.shape)
Image shape: (227, 227, 3) Output Shape of convolution: (55, 55, 96)
Max pooling is a technique often incorporated into Convolutional Neural Network (CNN) architectures. Its primary function is to reduce the spatial dimensions of the feature maps created by filtering an image. This reduction in size decreases both the number of parameters and the computational complexity of the network, helping to streamline processing.
Max pooling serves multiple purposes, it helps prevent overfitting by abstracting the features of the input, reduces the number of parameters that the network needs to learn, and imparts a basic level of translation invariance to the model's internal representation. It achieves this by applying a maximum filter to typically non-overlapping sections of the feature map.
In AlexNet, it was chosen to use max pooling windows of size 3x3 with a stride of 2, allowing some overlap between windows. This configuration of max pooling contributed to a decrease in the top-1 error rate by 0.4% and the top-5 error rate by 0.3%. In contrast, using non-overlapping pooling windows of size 2x2 with a stride of 2 would result in the same output dimensions but without these improvements in error rates.
Overlapping Max Pooling: Max pooling layers are commonly used in Convolutional Neural Networks (CNNs) to down-sample the spatial dimensions (width and height) of input tensors while maintaining the same depth. Overlapping max pooling differs from standard max pooling in that the regions over which the maximum is calculated overlap. In AlexNet, this method involves using 3x3 windows with a stride of 2, allowing adjacent pooling areas to overlap. This overlapping pooling strategy helps enhance feature extraction and has been shown to reduce the top-1 and top-5 error rates by 0.4% and 0.3% respectively, compared to non-overlapping pooling with 2x2 windows and a stride of 2. Additionally, overlapping pooling contributes to better generalization and slight reduction in overfitting during the training process.
pool_size.'same'), we compute the required amount of padding for both height and width based on the pooling window size and apply zero-padding to the image.
MaxPooling2D layer with a 3x3 pooling window and stride of 2.
MaxPooling2D, which is highly optimized and runs on GPU.
This custom implementation can be used for learning purposes or as part of a larger neural network framework.
import numpy as np
class MaxPooling2D:
def __init__(self, pool_size=(2, 2), stride=None, padding='valid'):
"""
Initializes the MaxPooling2D layer.
Parameters:
- pool_size (tuple): Size of the pooling window (height, width).
- stride (int or tuple): Step size for moving the pooling window.
- padding (str): 'valid' (no padding) or 'same' (zero padding).
"""
self.pool_size = pool_size
self.stride = stride if stride is not None else pool_size # Default to pool_size if stride is None
self.padding = padding
def apply_padding(self, image, pad_h, pad_w):
"""
Pads the input image with zeros if necessary.
Parameters:
- image (np.array): The input image.
- pad_h (int): Padding height.
- pad_w (int): Padding width.
Returns:
- np.array: Padded image.
"""
return np.pad(image, ((pad_h, pad_h), (pad_w, pad_w), (0, 0)), mode='constant', constant_values=0)
def forward(self, image):
"""
Performs the max pooling operation.
Parameters:
- image (np.array): Input image with shape (height, width, channels).
Returns:
- np.array: The pooled image.
"""
# Image dimensions
image_height, image_width, channels = image.shape
pool_height, pool_width = self.pool_size
# Padding if necessary
if self.padding == 'same':
pad_h = (pool_height - 1) // 2
pad_w = (pool_width - 1) // 2
image = self.apply_padding(image, pad_h, pad_w)
elif self.padding == 'valid':
pad_h, pad_w = 0, 0
# Calculate output dimensions
output_height = (image.shape[0] - pool_height) // self.stride[0] + 1
output_width = (image.shape[1] - pool_width) // self.stride[1] + 1
# Initialize the output array
pooled_output = np.zeros((output_height, output_width, channels))
# Perform pooling operation
for h in range(output_height):
for w in range(output_width):
for c in range(channels):
# Extract the region of interest (ROI) for the current filter position
region = image[
h * self.stride[0]: h * self.stride[0] + pool_height,
w * self.stride[1]: w * self.stride[1] + pool_width,
c
]
# Apply max pooling to the region
pooled_output[h, w, c] = np.max(region)
return pooled_output
# Example usage
# Simulated image with batch size of 1, height=6, width=6, and 1 channel
image = np.random.randn(55, 55, 96)
# Instantiate the MaxPooling2D layer with pool_size=(2, 2), stride=(2, 2), and 'valid' padding
#max_pool = MaxPooling2D(pool_size=(2, 2), stride=(2, 2), padding='valid')
max_pool= MaxPooling2D(pool_size=(3,3), stride=(2,2), padding='valid')
# Apply the max pooling operation to the image
output = max_pool.forward(image)
print("Input Shape:\n",image.shape)
print("Output Shape of MaxPooling2D:\n", output.shape)
Input Shape: (55, 55, 96) Output Shape of MaxPooling2D: (27, 27, 96)
Fully Connected Layers ( FFN )
Fully connected (FC) layers are crucial components of Convolutional Neural Networks (CNNs), connecting every neuron in one layer to the next. Unlike convolutional and pooling layers, which collect spatial features, fully connected layers combine all previously learned features to create final predictions.
AlexNet's final layers are completely connected and incorporate high-level information obtained in previous convolutional layers. These layers enable complicated decision-making by combining all feature maps and assigning them to specified classes.
The softmax layer, often the final layer in CNNs for classification tasks, converts output values from the previous layer into probabilities for each class. It’s defined by the formula:
where zi represents the input score for each class.
In AlexNet, the softmax layer assigns probabilities to each class, allowing the network to make a clear prediction by identifying the class with the highest probability. This final step turns high-level features into an interpretable output, guiding the model to its classification decision.
__init__ method sets up the Dense layer with the given number of input and output features, preparing the layer for use.initialize_params method initializes weights with random values scaled for balanced activations and sets biases to zero.forward method computes the output by performing a dot product of the input with the weights and adding the biases.This concise implementation illustrates the fundamental workings of a Dense layer, ideal for learning and understanding the basics.
class Dense:
def __init__(self, in_features, out_features):
"""
Initializes the Dense (fully connected) layer.
Parameters:
- input_size (int): Number of input features.
- output_size (int): Number of output features.
"""
self.input_size = in_features
self.output_size = out_features
self.weights = None
self.biases = None
self.initialize_params()
def initialize_params(self):
"""
Initializes weights and biases with random values.
Weights are initialized using a normal distribution.
Biases are initialized to zeros.
"""
self.weights = np.random.randn(self.input_size, self.output_size) * np.sqrt(2. / self.input_size)
self.biases = np.zeros((1, self.output_size))
def forward(self, input_data):
"""
Performs the forward pass of the Dense layer.
Parameters:
- input_data (np.array): Input data with shape (batch_size, input_size).
Returns:
- np.array: Output of the dense layer with shape (batch_size, output_size).
"""
return np.dot(input_data, self.weights) + self.biases
Local Response Normalization (LRN): Local Response Normalization (LRN) is a technique used to normalize the activations of neurons in a neural network by considering the activities of neighboring neurons within the same layer. Typically applied after the ReLU activation, LRN encourages competition among neurons, preventing any single neuron from dominating and promoting more balanced feature learning. This regularization helps improve generalization, making the network less likely to overfit and better able to handle unseen data. In practice, LRN has been shown to boost performance, as seen in AlexNet, where it contributed to improved accuracy and reduced error rates. LRN is computationally intensive, so it is only applied to the first two convolutional layers in AlexNet to avoid a significant performance cost in deeper layers.
The formula for LRN is shown below:
where:
To better understand Local Response Normalization (LRN), let’s consider an example: Suppose we have 3 feature maps (or channels) from a convolutional layer, each containing 4x4 activations at a certain spatial position in an image. We’ll focus on the same position across all feature maps, as LRN normalizes a neuron's value by considering the activations of its neighbors in other feature maps at the same spatial location.
Improvements: In AlexNet, applying LRN resulted in:
These improvements indicate better generalization and accuracy on the validation set due to the normalization effect.
import numpy as np
class custom_LocalResponseNormalization:
def __init__(self, depth_radius=5, bias=1.0, alpha=1e-4, beta=0.75):
"""
Initializes the LocalResponseNormalization layer.
Parameters:
- depth_radius (int): Number of neighboring channels for normalization.
- bias (float): Added to the normalization term to avoid division by zero.
- alpha (float): Scaling parameter.
- beta (float): Exponent for normalization.
"""
self.depth_radius = depth_radius
self.bias = bias
self.alpha = alpha
self.beta = beta
def forward(self, X):
"""
Applies local response normalization to the input data.
Parameters:
- X (np.array): Input data with shape (height, width, channels).
Returns:
- np.array: The normalized data.
"""
H, W, C = X.shape # height, width, channels
squared_input = np.square(X)
output = np.zeros_like(X)
for h in range(H): # Iterate over height
for w in range(W): # Iterate over width
for c in range(C): # Iterate over channels
# Define the start and end of the depth radius range
start = max(0, c - self.depth_radius // 2)
end = min(C, c + self.depth_radius // 2 + 1)
# Sum the square of inputs within the local region
local_sum = np.sum(squared_input[h, w, start:end], axis=0)
# Apply LRN formula
scale = self.bias + (self.alpha * local_sum)
output[h, w, c] = X[h, w, c] / (scale ** self.beta)
return output
With around 60 million parameters to train, the creators of AlexNet looked for extra ways to avoid overfitting. One method they used was the dropout technique, introduced by G.E. Hinton in 2012 [1]. In dropout, there's a 50% chance that each neuron is temporarily left out during training, which means it doesn't affect the learning process. This variation ensures that every input is handled by a slightly different configuration of the network, which makes the network's learning more effective and less likely to overfit. Dropout isn't used during testing; the whole network is active, and the outputs are halved to balance out the effect of the dropout during training. While dropout does make training take twice as long, it significantly helps prevent AlexNet from overfitting.
In AlexNet, minimal pre-processing is applied to the input images: each pixel’s value is adjusted by subtracting the mean pixel activity across the entire training set, centering the pixel values and aiding model training. While the standard input image size for AlexNet is generally 224x224x3 (height, width, and color channels), an input size of 227x227x3 is used in calculations to simplify the output feature map dimensions in the first convolutional layer.
In convolutional layers, each filter (or kernel) slides across the input image, computing dot products between the filter values and the image pixel values in the area the filter covers. This produces an output feature map where each "pixel" represents a feature detected in that region.
The formula for calculating the output dimension (Wout, Hout) of a convolutional layer:
Wout = ((Win + 2P - K) / S) + 1
Hout = ((Hin + 2P - K) / S) + 1
Plugging in AlexNet’s values:
Wout = ((227 + 2*0 - 11) / 4) + 1 = (216 / 4) + 1 = 55
Hout = ((227 + 2*0 - 11) / 4) + 1 = (216 / 4) + 1 = 55
Output feature map size: 55 x 55 x 96
After each convolution in AlexNet, a ReLU activation function is applied element-wise, setting negative values to zero while keeping positive values unchanged. In this layer, ReLU is followed by Local Response Normalization (LRN), which normalizes activations across neighboring feature maps to improve generalization and reduce overfitting.
The input to this layer is the output from the first convolutional layer, which has dimensions 55 x 55 x 96.
Plugging in values:
Wout = ((55 + 2*0 - 3) / 2) + 1 = 27
Hout = ((55 + 2*0 - 3) / 2) + 1 = 27
Output feature map size: 27 x 27 x 96
Plugging in values:
Wout = ((27 + 2*2 - 5) / 1) + 1 = 27
Hout = ((27 + 2*2 - 5) / 1) + 1 = 27
Output feature map size: 27 x 27 x 256
Similar to the first convolutional layer, a ReLU activation function is applied element-wise. Following this, Local Response Normalization (LRN) is applied to normalize activations across neighboring feature maps.
Plugging in values:
Wout = ((27 + 2*0 - 3) / 2) + 1 = 13
Hout = ((27 + 2*0 - 3) / 2) + 1 = 13
This gives us an output feature map size of 13 x 13 for each filter. Since there are 256 filters in the second Max Pooling Layer the output shape of the second pooling layer is 13 x 13 x 256.
Similar to the first pooling layer, no activation function is applied here, as pooling’s role is primarily down-sampling. Local Response Normalization (LRN) is not used after this pooling layer, as it was only applied in specific layers earlier in the AlexNet architecture.
Plugging in values:
Wout = ((13 + 2*1 - 3) / 1) + 1 = 13
Hout = ((13 + 2*1 - 3) / 1) + 1 = 13
This results in an output feature map size of 13 x 13 for each filter. Since there are 384 filters in this layer, the output shape of the third convolutional layer is 13 x 13 x 384.
In this layer, the ReLU activation function is applied after convolution, while Local Response Normalization (LRN) is not utilized.
The fourth convolutional layer in AlexNet takes the output feature map from the third convolutional layer as its input. The parameters for this layer are:
In this layer, each of the 384 filters (3 x 3 x 384) slides over the feature map produced by the third convolutional layer (13 x 13 x 384). With a padding of 1, the spatial dimensions remain the same as the input, ensuring that the convolution covers the full feature map without any reduction in size.
Wout = ((Win + 2P - K) / S) + 1
Hout = ((Hin + 2P - K) / S) + 1
Plugging in values from the previous layer:
Wout = ((13 + 2*1 - 3) / 1) + 1 = (12) + 1 = 13
Hout = ((13 + 2*1 - 3) / 1) + 1 = (12) + 1 = 13
This results in an output feature map size of 13 x 13 for each filter.
With 384 filters, the output shape of the fourth convolutional layer is 13 x 13 x 384.
Similar to the last layer, the ReLU activation function is applied after convolution, while Local Response Normalization (LRN) is not utilized.
The fifth convolutional layer in AlexNet takes the output feature map from the fourth convolutional layer as input. The parameters for this layer are:
In this layer, each of the 256 filters (3 x 3 x 384) slides over the feature map produced by the fourth convolutional layer (13 x 13 x 384). The padding of 1 keeps the spatial dimensions of the output feature map the same as the input, so the convolution covers the entire input feature map without reducing its size.
Wout = ((Win + 2P - K) / S) + 1
Hout = ((Hin + 2P - K) / S) + 1
Plugging in values from the previous layer:
Wout = ((13 + 2*1 - 3) / 1) + 1 = (12) + 1 = 13
Hout = ((13 + 2*1 - 3) / 1) + 1 = (12) + 1 = 13
This results in an output feature map size of 13 x 13 for each filter.
Since there are 256 filters in this layer, the output shape of the fifth convolutional layer is 13 x 13 x 256.
Similar to the last two layers, the ReLU activation function is applied after convolution, while Local Response Normalization (LRN) is not utilized.
The third pooling layer in AlexNet performs max pooling on the feature map produced by the fifth convolutional layer to further reduce spatial dimensions while retaining essential features. The parameters for this pooling layer are:
This layer applies a 3 x 3 max pooling filter to the input feature map (13 x 13 x 256) with a stride of 2. The filter slides over the input, selecting the maximum value from each 3 x 3 region, which reduces the spatial dimensions while preserving important features.
Wout = ((Win + 2P - K) / S) + 1
Hout = ((Hin + 2P - K) / S) + 1
Plugging in values from the previous layer:
Wout = ((13 + 2*0 - 3) / 2) + 1 = (5) + 1 = 6
Hout = ((13 + 2*0 - 3) / 2) + 1 = (5) + 1 = 6
This results in an output feature map size of 6 x 6 for each filter.
Since the number of channels (256) is preserved, the output shape of the third pooling layer is 6 x 6 x 256.
In this layer, no activation function or Local Response Normalization (LRN) is applied. The pooling layer’s sole purpose is to down-sample the feature map.
The next fully connected layers in AlexNet take the flattened output from the final pooling layer (6 x 6 x 256) and transform it into a 1-dimensional vector. These layers are designed to perform high-level reasoning on the features extracted by the convolutional and pooling layers. The first fully connected layer has 4096 neurons, followed by another fully connected layer with 4096 neurons, and finally, a fully connected layer with 1000 neurons corresponding to the number of classes in the dataset. Each of these layers uses the ReLU activation function to introduce non-linearity.
Computing similarity using Euclidean distance between two 4096-dimensional vectors is inefficient; however, efficiency can be improved by training an auto-encoder to compress these vectors into short binary codes. This approach would enhance image retrieval methods significantly compared to using autoencoders directly on raw pixels, which often fail to retrieve semantically similar images due to their focus on edge patterns alone [2].
To combat overfitting, the authors of AlexNet incorporated the dropout technique. In this method, each neuron has a 50% chance of being temporarily removed from the network during training. This means it won't contribute to either the forward or backward propagation, ensuring that each input is processed by a slightly different network configuration. This makes the learned weight parameters more robust and less prone to overfitting.
The final layer in the AlexNet architecture is a fully connected output layer, denoted as "y," which includes 1000 possible outputs using the softmax activation function. The network combines two fully connected layers, each with 2048 neurons, into a single fully connected layer with 4096 neurons. This connects directly to the output layer. The softmax function ensures that the sum of all output layer values ranges between 0 and 1. AlexNet uses a subset of the ImageNet dataset, featuring 1000 output neurons corresponding to classes labeled 0 through 999. To visualize this process, refer to the diagram provided below.
This sample code execution defines and runs the forward pass of an AlexNet-inspired model. It processes an input image of the same size as AlexNet (227x227x3) through multiple convolutional and pooling layers, followed by fully connected layers to generate class predictions. The model combines feature extraction and classification steps to output a probability distribution over 1000 classes.
from tensorflow.keras import layers
from tensorflow.keras.models import Model
import tensorflow as tf
# Apply ReLU activation in TensorFlow
class AlexNetModel:
def __init__(self, input_shape=(224,224,3)):
self.input_shape = input_shape
#self.model = self.build_model()
def forward(self,X):
#First Conv Layer
conv1_layer = Conv2D(filters=96, kernel_size=(11, 11), stride=4, padding='valid')
X=conv1_layer.forward(X)
X=tf.nn.relu(X)
lrn_custom = custom_LocalResponseNormalization()
X = lrn_custom.forward(X)
#First Max Pooling Layer
max_pool= MaxPooling2D(pool_size=(3,3), stride=(2,2), padding='valid')
X=max_pool.forward(X)
#Second Conv Layer
conv2_layer = Conv2D(filters=256, kernel_size=(5, 5), stride=1, padding='same')
X=conv2_layer.forward(X)
X=tf.nn.relu(X)
lrn_custom = custom_LocalResponseNormalization()
X = lrn_custom.forward(X)
#Second Max Pooling Layer
max_pool = MaxPooling2D(pool_size=(3, 3), stride=(2, 2), padding='valid')
X=max_pool.forward(X)
#Third Conv Layer
conv3_layer = Conv2D(filters=384, kernel_size=(3, 3), stride=1, padding='same')
X=conv3_layer.forward(X)
X=tf.nn.relu(X)
#Fourth Conv Layer
conv4_layer = Conv2D(filters=384, kernel_size=(3, 3), stride=1, padding='same')
X=conv4_layer.forward(X)
X=tf.nn.relu(X)
#Fifth Conv Layer
conv5_layer = Conv2D(filters=256, kernel_size=(3, 3), stride=1, padding='same')
X=conv5_layer.forward(X)
X=tf.nn.relu(X)
#third Max Pooling Layer
max_pool = MaxPooling2D(pool_size=(3, 3), stride=(2, 2), padding='valid')
X=max_pool.forward(X)
X=X.flatten() #flattening to make it compatible for fully connected layers
#fully connected layer1
X_fc1=Dense(in_features=9216,out_features=4096)
X=X_fc1.forward(X)
X=tf.nn.relu(X)
X = tf.nn.dropout(X, 0.5)
#fully connected layer2
X_fc2=Dense(in_features=4096,out_features=4096)
X=X_fc2.forward(X)
X=tf.nn.relu(X)
X = tf.nn.dropout(X, 0.5)
#output (softmax) layer
X_fc3=Dense(in_features=4096,out_features=1000)
y=X_fc3.forward(X)
y=tf.nn.softmax(y)
return y
def summary(self):
return self.model.summary()
# Instantiate the AlexNet model class and print the summary
X=np.random.randn(227,227,3)
alexnet = AlexNetModel()
y=alexnet.forward(X)
y.shape
#alexnet.summary()
TensorShape([1, 1000])
We have used a batch size of 2, meaning that the model will process two samples at a time before updating its weights. This small batch size results in more frequent updates, potentially leading to a more dynamic learning process. Small batch sizes can help the model capture fine-grained patterns and avoid local minima but may also introduce more noise in updates.
Iterability: The batch size is set to be adjustable, allowing for modifications based on memory constraints or performance requirements. Small batch sizes like 2 can be effective when working with limited memory or large models, though they may increase training time due to the larger number of iterations per epoch.
def compile_and_train_alexnet(model, initial_lr=0.01, decay=0.0005, patience=10, min_lr=1e-6):
"""
Compiles and trains an AlexNet model using SGD optimizer with a ReduceLROnPlateau learning rate scheduler.
Parameters:
- model: The uncompiled AlexNet model.
- initial_lr (float): Starting learning rate for the optimizer.
- decay (float): Weight decay parameter for regularization.
- patience (int): Number of epochs with no improvement after which learning rate will be reduced.
- min_lr (float): Minimum learning rate for the scheduler.
Returns:
- model: The compiled model ready for training.
"""
# Define the SGD optimizer with the specified initial learning rate and weight decay
sgd_optimizer = tf.keras.optimizers.SGD(
learning_rate=initial_lr,
weight_decay=decay,
nesterov=False # Nesterov momentum is not used
)
# Compile the model with the defined optimizer, categorical crossentropy loss, and accuracy metric
model.compile(
optimizer=sgd_optimizer,
loss="categorical_crossentropy",
metrics=["accuracy"]
)
# Set up a ReduceLROnPlateau callback to reduce the learning rate by a factor of 10 when validation loss plateaus
lr_scheduler = tf.keras.callbacks.ReduceLROnPlateau(
monitor="val_loss", # Monitor the validation loss to determine when to adjust learning rate
factor=0.1, # Reduce learning rate by a factor of 10
patience=patience, # Number of epochs with no improvement before reducing the learning rate
min_lr=min_lr # Set a floor for the learning rate
)
# Display model architecture
model.summary()
return model, lr_scheduler
SGD Optimizer and Learning Rate Scheduler
The training configuration starts with an SGD (Stochastic Gradient Descent) optimizer, a foundational optimization technique for AlexNet, originally used to balance efficient learning with robust convergence:
The AlexNet model is compiled with settings specifically suited for multi-class image classification:
categorical_crossentropy, a standard for multi-class classification, ensures the model learns by minimizing the difference between predicted and true class probabilities.accuracy is selected as a metric to directly monitor the model’s classification performance, enabling clear tracking of how well AlexNet learns to distinguish between image classes.
A ReduceLROnPlateau callback is set up, ensuring that the learning rate adapts based on the model’s validation performance (val_loss), which is crucial for AlexNet’s convergence:
patience=10), the learning rate is reduced by a factor of 0.1. This gradual decrease is essential for AlexNet’s large network structure, helping the model settle into a solution by slowing down learning when further improvements are harder to achieve.1e-6. This minimum limit is crucial in AlexNet to prevent overly small learning rates, which can cause the model to stop improving.
The function returns the compiled AlexNet model along with a learning rate scheduler ready for use in training (via model.fit). This setup supports AlexNet’s need for dynamic learning rate adjustments, aiding in smooth and efficient convergence. The scheduler’s design reduces the learning rate when necessary, allowing AlexNet to fine-tune its parameters as it progresses, resulting in improved model performance and stability.
def train_model_with_callbacks(model, train_data, valid_data, checkpoint_dir, lr_scheduler, patience=10, epochs=50):
"""
Train the model with callbacks for checkpointing, early stopping, and learning rate reduction.
Args:
model: The model to be trained.
train_data: The training dataset.
valid_data: The validation dataset.
checkpoint_dir: Directory where the model checkpoints will be saved.
lr_scheduler: Learning rate scheduler callback.
patience: Patience for early stopping, default is 10.
epochs: Number of epochs to train the model, default is 50.
Returns:
model_fit: The result of model training (fit history).
"""
# Define the checkpoint path with epoch in the filename
checkpoint_path = os.path.join(checkpoint_dir, "cp-{epoch:04d}.weights.h5") # creates the checkpoint at epoch level
# Callbacks for Checkpointing
cp_callback = tf.keras.callbacks.ModelCheckpoint(
checkpoint_path, verbose=1, save_weights_only=True
)
# Callbacks for EarlyStopping
es = tf.keras.callbacks.EarlyStopping(patience=patience, monitor="loss")
# Train the model
model_fit = model.fit(
train_data,
validation_data=valid_data,
epochs=epochs,
callbacks=[es, cp_callback, lr_scheduler]
)
return model_fit
The train_model_with_callbacks function is designed to train a given model using the specified training and validation datasets while incorporating various callbacks to enhance the training process. This function integrates ModelCheckpoint for saving the model’s weights at different epochs, EarlyStopping for preventing overfitting by halting training when no improvement is observed, and LearningRateScheduler for adjusting the learning rate based on the model's performance.
model_fit (tf.keras.callbacks.History): The result of the model.fit() call, which contains the training history, including the loss and accuracy metrics recorded during training. This object can be used for further analysis or plotting the training progress.
The function creates a checkpoint path using the provided checkpoint_dir and saves the model’s weights at each epoch (using the format cp-{epoch:04d}.weights.h5). The ModelCheckpoint callback is used to periodically save the model weights during training, ensuring that progress is not lost.
The EarlyStopping callback monitors the training loss and halts training if there is no improvement in the loss for a specified number of epochs (patience). This prevents unnecessary training and potential overfitting.
The learning rate is adjusted dynamically using the provided lr_scheduler callback, based on the model's performance on the validation set. This helps in optimizing the learning process by reducing the learning rate when needed.
The model.fit() function is called with the training and validation data, specifying the number of epochs, and the callbacks. The training process is carried out with real-time monitoring and adjustments based on the callbacks.
We can begin by trying out a sample task using mini Imagenet dataset contain 16 classes. The following categories are considered for the analysis: ['tench', 'goldfish', 'great_white_shark', 'tiger_shark', 'hammerhead', 'electric_ray','stingray', 'cock', 'hen', 'ostrich', 'brambling', 'goldfinch','house_finch', 'junco', 'indigo_bunting','robin']
Data Loading - Mini ImageNet of 16 Classes
We will be loading the data using tf.keras.preprocessing.image_dataset_from_directory
In the paper, the downsampling technique used resizes the images so that the shortest side is scaled to 224 pixels while maintaining the aspect ratio. Here’s a step-by-step outline for implementing this:
img_height = 224
img_width = 224
batch_size = 2 # '2' because we have a small set of samples
df_train = tf.keras.preprocessing.image_dataset_from_directory(
"Imagenet_mini/train",
labels="inferred", # Automatically labels are picked based on the sub_directories name
label_mode="categorical", # labels are encoded as a categorical vector(for categorical loss )
# class_names = ['tench', 'goldfish', 'great_white_shark', 'tiger_shark', 'hammerhead', 'electric_ray',
# 'stingray', 'cock', 'hen', 'ostrich', 'brambling', 'goldfinch','house_finch', 'junco', 'indigo_bunting',
# 'robin'], # This is the explicit list of class names (must match names of subdirectories)
color_mode="rgb",
batch_size=batch_size,
image_size=(
img_height,
img_width,
), # Size to resize images to after they are read from disk,
shuffle=True,
seed=123,
validation_split=0.1,
subset="training",
)
Found 513 files belonging to 16 classes. Using 462 files for training.
df_valid = tf.keras.preprocessing.image_dataset_from_directory(
"Imagenet_mini/train",
labels="inferred", # Automatically labels are picked based on the sub_directories name
label_mode="categorical", # labels are encoded as a categorical vector(for categorical loss )
# class_names = ['tench', 'goldfish', 'great_white_shark', 'tiger_shark', 'hammerhead', 'electric_ray',
# 'stingray', 'cock', 'hen', 'ostrich', 'brambling', 'goldfinch','house_finch', 'junco', 'indigo_bunting',
# 'robin'], # This is the explicit list of class names (must match names of subdirectories)
color_mode="rgb",
batch_size=batch_size,
image_size=(
img_height,
img_width,
), # Size to resize images to after they are read from disk,
shuffle=True,
seed=123,
validation_split=0.1,
subset="validation",
)
Found 513 files belonging to 16 classes. Using 51 files for validation.
plt.figure(figsize=(10, 10))
for images, labels in df_train.take(1):
for i in range(2):
ax = plt.subplot(2, 2, i + 1)
plt.imshow(images[i].numpy().astype("uint8"))
plt.title("Before Data Augmentation") # Add this line
plt.axis("off")
plt.show()
2024-11-17 19:21:43.036463: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
def calculate_mean_std(dataset):
pixel_sum, pixel_square_sum, num_pixels = 0.0, 0.0, 0
for image, _ in dataset:
pixel_sum += tf.reduce_sum(image)
pixel_square_sum += tf.reduce_sum(tf.square(image))
num_pixels += tf.size(image)
mean = pixel_sum / tf.cast(num_pixels, tf.float32)
std_dev = tf.sqrt(
pixel_square_sum / tf.cast(num_pixels, tf.float32) - tf.square(mean)
)
return mean, std_dev
def normalize_image(image, mean, std_dev):
# Subtract mean and divide by standard deviation
return image - mean
# Calculate mean and standard deviation from the training dataset
mean, std_dev = calculate_mean_std(df_train)
# Function to normalize image
def normalize_image_dataset(image, label):
image = normalize_image(image, mean, std_dev)
return image, label
2024-11-17 19:20:31.038529: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
def horizontal_reflection(image, label):
image = tf.image.flip_left_right(image)
return image, label
def adjusting_brightness(image, label):
image = tf.image.adjust_brightness(image, 0.2)
return image, label
def DataAugmentation(dataset):
# Applying Mean Normalization
dataset = dataset.map(normalize_image_dataset, num_parallel_calls=tf.data.AUTOTUNE)
# Applying Horizontal flip
dataset = dataset.map(horizontal_reflection, num_parallel_calls=tf.data.AUTOTUNE)
# Applying Adjusting Brightness
# dataset = dataset.map(adjusting_brightness, num_parallel_calls = tf.data.AUTOTUNE)
return dataset
normalized_train_df = DataAugmentation(df_train)
# Optimizing Purpose
normalized_train_df = normalized_train_df.prefetch(tf.data.AUTOTUNE)
normalized_valid_df = DataAugmentation(df_valid)
# Optimizing Purpose
normalized_valid_df = normalized_valid_df.prefetch(tf.data.AUTOTUNE)
plt.figure(figsize=(10, 10))
for images, labels in normalized_train_df.take(1):
for i in range(2):
ax = plt.subplot(2, 2, i + 1)
plt.imshow(images[i].numpy().astype("uint8"))
plt.title("After Data Augmentation")
plt.axis("off")
2024-11-17 19:21:49.782143: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
model = tf.keras.models.Sequential(
[
# 1st conv
tf.keras.layers.Conv2D(
96, (11, 11), strides=(4, 4), activation="relu", input_shape=(224, 224, 3)
),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.MaxPooling2D(3, strides=(2, 2)), # Overlap Pooling layer
# 2nd conv
tf.keras.layers.Conv2D(
256, (5, 5), strides=(1, 1), activation="relu", padding="same"
),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.MaxPooling2D(3, strides=(2, 2)), # Overlap Pooling layer
# 3rd conv
tf.keras.layers.Conv2D(
384, (3, 3), strides=(1, 1), activation="relu", padding="same"
),
tf.keras.layers.BatchNormalization(),
# 4th conv
tf.keras.layers.Conv2D(
384, (3, 3), strides=(1, 1), activation="relu", padding="same"
),
tf.keras.layers.BatchNormalization(),
# 5th Conv
tf.keras.layers.Conv2D(
256, (3, 3), strides=(1, 1), activation="relu", padding="same"
),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.MaxPooling2D(3, strides=(2, 2)), # Overlap Pooling layer
# To Flatten layer
tf.keras.layers.Flatten(),
# To FC layer 1
tf.keras.layers.Dense(4096, activation="relu"),
tf.keras.layers.Dropout(0.5),
# To FC layer 2
tf.keras.layers.Dense(4096, activation="relu"),
tf.keras.layers.Dropout(0.5),
tf.keras.layers.Dense(16, activation="softmax"),
]
)
Model Setup and Training Configuration
As per the paper, we are configuring a Alexnet model, by defining the SGD optimizer with an initial learning rate of 0.01, supplemented by a weight decay of 0.0005 for regularization to prevent overfitting. The model is compiled using categorical cross-entropy as the loss function, suitable for multi-class problems, and accuracy as the performance metric. To improve convergence, we implement a ReduceLROnPlateau callback that dynamically reduces the learning rate by a factor of 0.1 if validation loss does not improve for 10 consecutive epochs, ensuring that training remains effective without excessively low learning rates. Finally, the model.summary() function provides a detailed architectural overview, listing each layer’s type, output shape, and parameter count, which is valuable for documentation and model assessment. This setup supports adaptive learning, regularization, and consistent monitoring, forming a robust foundation for model training and evaluation.
def compile_and_train_alexnet(model, initial_lr=0.01, decay=0.0005, patience=10, min_lr=1e-6):
"""
Compiles and trains an AlexNet model using SGD optimizer with a ReduceLROnPlateau learning rate scheduler.
Parameters:
- model: The uncompiled AlexNet model.
- initial_lr (float): Starting learning rate for the optimizer.
- decay (float): Weight decay parameter for regularization.
- patience (int): Number of epochs with no improvement after which learning rate will be reduced.
- min_lr (float): Minimum learning rate for the scheduler.
Returns:
- model: The compiled model ready for training.
"""
# Define the SGD optimizer with the specified initial learning rate and weight decay
sgd_optimizer = tf.keras.optimizers.SGD(
learning_rate=initial_lr,
weight_decay=decay,
nesterov=False # Nesterov momentum is not used
)
# Compile the model with the defined optimizer, categorical crossentropy loss, and accuracy metric
model.compile(
optimizer=sgd_optimizer,
loss="categorical_crossentropy",
metrics=["accuracy"]
)
# Set up a ReduceLROnPlateau callback to reduce the learning rate by a factor of 10 when validation loss plateaus
lr_scheduler = tf.keras.callbacks.ReduceLROnPlateau(
monitor="val_loss", # Monitor the validation loss to determine when to adjust learning rate
factor=0.1, # Reduce learning rate by a factor of 10
patience=patience, # Number of epochs with no improvement before reducing the learning rate
min_lr=min_lr # Set a floor for the learning rate
)
# Display model architecture
model.summary()
return model, lr_scheduler
model, lr_scheduler = compile_and_train_alexnet(model)
Model: "sequential_1"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ conv2d_5 (Conv2D) │ (None, 54, 54, 96) │ 34,944 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_5 │ (None, 54, 54, 96) │ 384 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_3 (MaxPooling2D) │ (None, 26, 26, 96) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_6 (Conv2D) │ (None, 26, 26, 256) │ 614,656 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_6 │ (None, 26, 26, 256) │ 1,024 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_4 (MaxPooling2D) │ (None, 12, 12, 256) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_7 (Conv2D) │ (None, 12, 12, 384) │ 885,120 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_7 │ (None, 12, 12, 384) │ 1,536 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_8 (Conv2D) │ (None, 12, 12, 384) │ 1,327,488 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_8 │ (None, 12, 12, 384) │ 1,536 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_9 (Conv2D) │ (None, 12, 12, 256) │ 884,992 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_9 │ (None, 12, 12, 256) │ 1,024 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_5 (MaxPooling2D) │ (None, 5, 5, 256) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ flatten_1 (Flatten) │ (None, 6400) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_3 (Dense) │ (None, 4096) │ 26,218,496 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout_2 (Dropout) │ (None, 4096) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_4 (Dense) │ (None, 4096) │ 16,781,312 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout_3 (Dropout) │ (None, 4096) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_5 (Dense) │ (None, 16) │ 65,552 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 46,818,064 (178.60 MB)
Trainable params: 46,815,312 (178.59 MB)
Non-trainable params: 2,752 (10.75 KB)
def train_model_with_callbacks(model, train_data, valid_data, checkpoint_dir, lr_scheduler, patience=10, epochs=50):
"""
Train the model with callbacks for checkpointing, early stopping, and learning rate reduction.
Args:
model: The model to be trained.
train_data: The training dataset.
valid_data: The validation dataset.
checkpoint_dir: Directory where the model checkpoints will be saved.
lr_scheduler: Learning rate scheduler callback.
patience: Patience for early stopping, default is 10.
epochs: Number of epochs to train the model, default is 50.
Returns:
model_fit: The result of model training (fit history).
"""
# Define the checkpoint path with epoch in the filename
checkpoint_path = os.path.join(checkpoint_dir, "cp-{epoch:04d}.weights.h5") # creates the checkpoint at epoch level
# Callbacks for Checkpointing
cp_callback = tf.keras.callbacks.ModelCheckpoint(
checkpoint_path, verbose=1, save_weights_only=True
)
# Callbacks for EarlyStopping
es = tf.keras.callbacks.EarlyStopping(patience=patience, monitor="loss")
# Train the model
model_fit = model.fit(
train_data,
validation_data=valid_data,
epochs=epochs,
callbacks=[es, cp_callback, lr_scheduler]
)
return model_fit
model_fit_1 = train_model_with_callbacks(model= model,
train_data= normalized_train_df,
valid_data= normalized_valid_df,
checkpoint_dir="model_training_excAdjBrig_incWeiDec",
lr_scheduler= lr_scheduler,
patience=10,
epochs=50)
Epoch 1/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 77ms/step - accuracy: 0.0680 - loss: 30.1693 Epoch 1: saving model to model_training_excAdjBrig_incWeiDec/cp-0001.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 19s 80ms/step - accuracy: 0.0680 - loss: 30.1090 - val_accuracy: 0.1176 - val_loss: 4.3790 - learning_rate: 0.0100 Epoch 2/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 77ms/step - accuracy: 0.0844 - loss: 5.3418 Epoch 2: saving model to model_training_excAdjBrig_incWeiDec/cp-0002.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 18s 80ms/step - accuracy: 0.0844 - loss: 5.3394 - val_accuracy: 0.0588 - val_loss: 3.5982 - learning_rate: 0.0100 Epoch 3/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 83ms/step - accuracy: 0.0961 - loss: 3.8792 Epoch 3: saving model to model_training_excAdjBrig_incWeiDec/cp-0003.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 20s 86ms/step - accuracy: 0.0960 - loss: 3.8785 - val_accuracy: 0.0784 - val_loss: 3.3251 - learning_rate: 0.0100 Epoch 4/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 83ms/step - accuracy: 0.1075 - loss: 3.7405 Epoch 4: saving model to model_training_excAdjBrig_incWeiDec/cp-0004.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 20s 85ms/step - accuracy: 0.1075 - loss: 3.7391 - val_accuracy: 0.1373 - val_loss: 2.8175 - learning_rate: 0.0100 Epoch 5/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 81ms/step - accuracy: 0.1332 - loss: 3.2735 Epoch 5: saving model to model_training_excAdjBrig_incWeiDec/cp-0005.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 19s 83ms/step - accuracy: 0.1333 - loss: 3.2733 - val_accuracy: 0.1176 - val_loss: 2.9204 - learning_rate: 0.0100 Epoch 6/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 83ms/step - accuracy: 0.1580 - loss: 3.2098 Epoch 6: saving model to model_training_excAdjBrig_incWeiDec/cp-0006.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 20s 86ms/step - accuracy: 0.1579 - loss: 3.2095 - val_accuracy: 0.1373 - val_loss: 2.6970 - learning_rate: 0.0100 Epoch 7/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 80ms/step - accuracy: 0.1321 - loss: 3.0187 Epoch 7: saving model to model_training_excAdjBrig_incWeiDec/cp-0007.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 19s 83ms/step - accuracy: 0.1322 - loss: 3.0181 - val_accuracy: 0.1373 - val_loss: 3.2644 - learning_rate: 0.0100 Epoch 8/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 80ms/step - accuracy: 0.1752 - loss: 2.9598 Epoch 8: saving model to model_training_excAdjBrig_incWeiDec/cp-0008.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 19s 83ms/step - accuracy: 0.1753 - loss: 2.9595 - val_accuracy: 0.1373 - val_loss: 2.8043 - learning_rate: 0.0100 Epoch 9/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 81ms/step - accuracy: 0.1939 - loss: 2.8105 Epoch 9: saving model to model_training_excAdjBrig_incWeiDec/cp-0009.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 19s 83ms/step - accuracy: 0.1939 - loss: 2.8103 - val_accuracy: 0.1176 - val_loss: 2.9239 - learning_rate: 0.0100 Epoch 10/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 81ms/step - accuracy: 0.1793 - loss: 2.7810 Epoch 10: saving model to model_training_excAdjBrig_incWeiDec/cp-0010.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 19s 83ms/step - accuracy: 0.1793 - loss: 2.7813 - val_accuracy: 0.1373 - val_loss: 2.7656 - learning_rate: 0.0100 Epoch 11/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 81ms/step - accuracy: 0.2412 - loss: 2.7238 Epoch 11: saving model to model_training_excAdjBrig_incWeiDec/cp-0011.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 19s 84ms/step - accuracy: 0.2411 - loss: 2.7236 - val_accuracy: 0.1569 - val_loss: 2.7274 - learning_rate: 0.0100 Epoch 12/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 82ms/step - accuracy: 0.1778 - loss: 2.6660 Epoch 12: saving model to model_training_excAdjBrig_incWeiDec/cp-0012.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 20s 85ms/step - accuracy: 0.1780 - loss: 2.6656 - val_accuracy: 0.1569 - val_loss: 2.6087 - learning_rate: 0.0100 Epoch 13/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 80ms/step - accuracy: 0.2486 - loss: 2.5211 Epoch 13: saving model to model_training_excAdjBrig_incWeiDec/cp-0013.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 19s 83ms/step - accuracy: 0.2486 - loss: 2.5210 - val_accuracy: 0.1569 - val_loss: 2.7734 - learning_rate: 0.0100 Epoch 14/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 82ms/step - accuracy: 0.2942 - loss: 2.5266 Epoch 14: saving model to model_training_excAdjBrig_incWeiDec/cp-0014.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 20s 85ms/step - accuracy: 0.2943 - loss: 2.5260 - val_accuracy: 0.2549 - val_loss: 2.7223 - learning_rate: 0.0100 Epoch 15/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 82ms/step - accuracy: 0.3045 - loss: 2.3505 Epoch 15: saving model to model_training_excAdjBrig_incWeiDec/cp-0015.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 20s 85ms/step - accuracy: 0.3045 - loss: 2.3504 - val_accuracy: 0.1373 - val_loss: 2.9778 - learning_rate: 0.0100 Epoch 16/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 83ms/step - accuracy: 0.2979 - loss: 2.3448 Epoch 16: saving model to model_training_excAdjBrig_incWeiDec/cp-0016.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 20s 86ms/step - accuracy: 0.2979 - loss: 2.3449 - val_accuracy: 0.1569 - val_loss: 2.8415 - learning_rate: 0.0100 Epoch 17/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 82ms/step - accuracy: 0.3578 - loss: 2.2032 Epoch 17: saving model to model_training_excAdjBrig_incWeiDec/cp-0017.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 20s 85ms/step - accuracy: 0.3578 - loss: 2.2031 - val_accuracy: 0.1176 - val_loss: 3.0142 - learning_rate: 0.0100 Epoch 18/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 83ms/step - accuracy: 0.3532 - loss: 2.1732 Epoch 18: saving model to model_training_excAdjBrig_incWeiDec/cp-0018.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 20s 86ms/step - accuracy: 0.3533 - loss: 2.1729 - val_accuracy: 0.2157 - val_loss: 2.6681 - learning_rate: 0.0100 Epoch 19/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 85ms/step - accuracy: 0.3838 - loss: 1.9958 Epoch 19: saving model to model_training_excAdjBrig_incWeiDec/cp-0019.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 20s 87ms/step - accuracy: 0.3839 - loss: 1.9960 - val_accuracy: 0.1765 - val_loss: 2.6428 - learning_rate: 0.0100 Epoch 20/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 85ms/step - accuracy: 0.3749 - loss: 2.0246 Epoch 20: saving model to model_training_excAdjBrig_incWeiDec/cp-0020.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 20s 88ms/step - accuracy: 0.3750 - loss: 2.0242 - val_accuracy: 0.1569 - val_loss: 2.8702 - learning_rate: 0.0100 Epoch 21/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 84ms/step - accuracy: 0.5059 - loss: 1.7430 Epoch 21: saving model to model_training_excAdjBrig_incWeiDec/cp-0021.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 20s 87ms/step - accuracy: 0.5059 - loss: 1.7431 - val_accuracy: 0.1765 - val_loss: 2.6816 - learning_rate: 0.0100 Epoch 22/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 85ms/step - accuracy: 0.5153 - loss: 1.7076 Epoch 22: saving model to model_training_excAdjBrig_incWeiDec/cp-0022.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 20s 88ms/step - accuracy: 0.5153 - loss: 1.7076 - val_accuracy: 0.1765 - val_loss: 2.8195 - learning_rate: 0.0100 Epoch 23/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 85ms/step - accuracy: 0.4757 - loss: 1.4861 Epoch 23: saving model to model_training_excAdjBrig_incWeiDec/cp-0023.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 20s 87ms/step - accuracy: 0.4761 - loss: 1.4853 - val_accuracy: 0.1961 - val_loss: 2.6759 - learning_rate: 1.0000e-03 Epoch 24/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 85ms/step - accuracy: 0.5953 - loss: 1.2928 Epoch 24: saving model to model_training_excAdjBrig_incWeiDec/cp-0024.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 20s 88ms/step - accuracy: 0.5955 - loss: 1.2921 - val_accuracy: 0.2157 - val_loss: 2.6260 - learning_rate: 1.0000e-03 Epoch 25/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 85ms/step - accuracy: 0.6155 - loss: 1.1139 Epoch 25: saving model to model_training_excAdjBrig_incWeiDec/cp-0025.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 20s 88ms/step - accuracy: 0.6157 - loss: 1.1135 - val_accuracy: 0.2157 - val_loss: 2.6285 - learning_rate: 1.0000e-03 Epoch 26/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 85ms/step - accuracy: 0.6288 - loss: 1.1189 Epoch 26: saving model to model_training_excAdjBrig_incWeiDec/cp-0026.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 20s 88ms/step - accuracy: 0.6290 - loss: 1.1184 - val_accuracy: 0.2353 - val_loss: 2.5984 - learning_rate: 1.0000e-03 Epoch 27/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 85ms/step - accuracy: 0.7032 - loss: 0.9267 Epoch 27: saving model to model_training_excAdjBrig_incWeiDec/cp-0027.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 20s 88ms/step - accuracy: 0.7034 - loss: 0.9265 - val_accuracy: 0.2549 - val_loss: 2.5893 - learning_rate: 1.0000e-03 Epoch 28/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 84ms/step - accuracy: 0.7057 - loss: 0.8769 Epoch 28: saving model to model_training_excAdjBrig_incWeiDec/cp-0028.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 20s 86ms/step - accuracy: 0.7058 - loss: 0.8767 - val_accuracy: 0.2353 - val_loss: 2.5752 - learning_rate: 1.0000e-03 Epoch 29/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 84ms/step - accuracy: 0.6935 - loss: 0.8896 Epoch 29: saving model to model_training_excAdjBrig_incWeiDec/cp-0029.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 20s 86ms/step - accuracy: 0.6937 - loss: 0.8893 - val_accuracy: 0.2157 - val_loss: 2.6136 - learning_rate: 1.0000e-03 Epoch 30/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 84ms/step - accuracy: 0.7267 - loss: 0.8104 Epoch 30: saving model to model_training_excAdjBrig_incWeiDec/cp-0030.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 20s 87ms/step - accuracy: 0.7268 - loss: 0.8104 - val_accuracy: 0.2353 - val_loss: 2.5989 - learning_rate: 1.0000e-03 Epoch 31/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 85ms/step - accuracy: 0.7756 - loss: 0.7794 Epoch 31: saving model to model_training_excAdjBrig_incWeiDec/cp-0031.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 20s 88ms/step - accuracy: 0.7756 - loss: 0.7792 - val_accuracy: 0.2745 - val_loss: 2.6105 - learning_rate: 1.0000e-03 Epoch 32/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 84ms/step - accuracy: 0.7526 - loss: 0.7054 Epoch 32: saving model to model_training_excAdjBrig_incWeiDec/cp-0032.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 20s 87ms/step - accuracy: 0.7527 - loss: 0.7054 - val_accuracy: 0.2353 - val_loss: 2.6656 - learning_rate: 1.0000e-03 Epoch 33/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 80ms/step - accuracy: 0.7446 - loss: 0.8048 Epoch 33: saving model to model_training_excAdjBrig_incWeiDec/cp-0033.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 19s 83ms/step - accuracy: 0.7448 - loss: 0.8044 - val_accuracy: 0.2549 - val_loss: 2.5800 - learning_rate: 1.0000e-03 Epoch 34/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 81ms/step - accuracy: 0.7824 - loss: 0.7112 Epoch 34: saving model to model_training_excAdjBrig_incWeiDec/cp-0034.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 19s 83ms/step - accuracy: 0.7824 - loss: 0.7109 - val_accuracy: 0.2353 - val_loss: 2.6696 - learning_rate: 1.0000e-03 Epoch 35/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 79ms/step - accuracy: 0.7783 - loss: 0.7268 Epoch 35: saving model to model_training_excAdjBrig_incWeiDec/cp-0035.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 19s 81ms/step - accuracy: 0.7785 - loss: 0.7265 - val_accuracy: 0.2549 - val_loss: 2.6434 - learning_rate: 1.0000e-03 Epoch 36/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 79ms/step - accuracy: 0.8053 - loss: 0.5805 Epoch 36: saving model to model_training_excAdjBrig_incWeiDec/cp-0036.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 19s 81ms/step - accuracy: 0.8054 - loss: 0.5804 - val_accuracy: 0.2353 - val_loss: 2.6038 - learning_rate: 1.0000e-03 Epoch 37/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 78ms/step - accuracy: 0.8208 - loss: 0.5933 Epoch 37: saving model to model_training_excAdjBrig_incWeiDec/cp-0037.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 19s 81ms/step - accuracy: 0.8208 - loss: 0.5933 - val_accuracy: 0.2353 - val_loss: 2.6679 - learning_rate: 1.0000e-03 Epoch 38/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 78ms/step - accuracy: 0.8170 - loss: 0.6470 Epoch 38: saving model to model_training_excAdjBrig_incWeiDec/cp-0038.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 19s 80ms/step - accuracy: 0.8170 - loss: 0.6468 - val_accuracy: 0.2549 - val_loss: 2.6872 - learning_rate: 1.0000e-03 Epoch 39/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 79ms/step - accuracy: 0.8503 - loss: 0.5325 Epoch 39: saving model to model_training_excAdjBrig_incWeiDec/cp-0039.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 19s 82ms/step - accuracy: 0.8503 - loss: 0.5327 - val_accuracy: 0.2353 - val_loss: 2.6939 - learning_rate: 1.0000e-04 Epoch 40/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 78ms/step - accuracy: 0.8397 - loss: 0.5199 Epoch 40: saving model to model_training_excAdjBrig_incWeiDec/cp-0040.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 19s 81ms/step - accuracy: 0.8397 - loss: 0.5198 - val_accuracy: 0.2549 - val_loss: 2.6655 - learning_rate: 1.0000e-04 Epoch 41/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 79ms/step - accuracy: 0.8810 - loss: 0.4415 Epoch 41: saving model to model_training_excAdjBrig_incWeiDec/cp-0041.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 19s 82ms/step - accuracy: 0.8809 - loss: 0.4416 - val_accuracy: 0.2353 - val_loss: 2.6670 - learning_rate: 1.0000e-04 Epoch 42/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 80ms/step - accuracy: 0.8328 - loss: 0.5510 Epoch 42: saving model to model_training_excAdjBrig_incWeiDec/cp-0042.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 19s 82ms/step - accuracy: 0.8328 - loss: 0.5510 - val_accuracy: 0.2353 - val_loss: 2.6755 - learning_rate: 1.0000e-04 Epoch 43/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 80ms/step - accuracy: 0.8671 - loss: 0.4723 Epoch 43: saving model to model_training_excAdjBrig_incWeiDec/cp-0043.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 19s 83ms/step - accuracy: 0.8672 - loss: 0.4721 - val_accuracy: 0.2353 - val_loss: 2.6720 - learning_rate: 1.0000e-04 Epoch 44/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 81ms/step - accuracy: 0.8641 - loss: 0.4796 Epoch 44: saving model to model_training_excAdjBrig_incWeiDec/cp-0044.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 19s 84ms/step - accuracy: 0.8642 - loss: 0.4795 - val_accuracy: 0.2549 - val_loss: 2.6680 - learning_rate: 1.0000e-04 Epoch 45/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 81ms/step - accuracy: 0.8254 - loss: 0.5534 Epoch 45: saving model to model_training_excAdjBrig_incWeiDec/cp-0045.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 19s 83ms/step - accuracy: 0.8255 - loss: 0.5533 - val_accuracy: 0.2549 - val_loss: 2.6844 - learning_rate: 1.0000e-04 Epoch 46/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 81ms/step - accuracy: 0.8888 - loss: 0.4521 Epoch 46: saving model to model_training_excAdjBrig_incWeiDec/cp-0046.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 19s 83ms/step - accuracy: 0.8888 - loss: 0.4520 - val_accuracy: 0.2549 - val_loss: 2.6989 - learning_rate: 1.0000e-04 Epoch 47/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 81ms/step - accuracy: 0.8674 - loss: 0.4804 Epoch 47: saving model to model_training_excAdjBrig_incWeiDec/cp-0047.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 19s 84ms/step - accuracy: 0.8674 - loss: 0.4803 - val_accuracy: 0.2353 - val_loss: 2.6838 - learning_rate: 1.0000e-04 Epoch 48/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 81ms/step - accuracy: 0.8573 - loss: 0.5079 Epoch 48: saving model to model_training_excAdjBrig_incWeiDec/cp-0048.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 19s 83ms/step - accuracy: 0.8573 - loss: 0.5079 - val_accuracy: 0.2353 - val_loss: 2.6862 - learning_rate: 1.0000e-04 Epoch 49/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 81ms/step - accuracy: 0.8903 - loss: 0.4518 Epoch 49: saving model to model_training_excAdjBrig_incWeiDec/cp-0049.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 19s 84ms/step - accuracy: 0.8903 - loss: 0.4518 - val_accuracy: 0.2353 - val_loss: 2.6988 - learning_rate: 1.0000e-05 Epoch 50/50 231/231 ━━━━━━━━━━━━━━━━━━━━ 0s 82ms/step - accuracy: 0.8244 - loss: 0.5896 Epoch 50: saving model to model_training_excAdjBrig_incWeiDec/cp-0050.weights.h5 231/231 ━━━━━━━━━━━━━━━━━━━━ 19s 84ms/step - accuracy: 0.8244 - loss: 0.5895 - val_accuracy: 0.2353 - val_loss: 2.6801 - learning_rate: 1.0000e-05
# Evaluate the Training model
loss, acc = model.evaluate(normalized_train_df, verbose=2)
print("Trained model, accuracy: {:5.2f}%".format(100 * acc))
231/231 - 4s - 19ms/step - accuracy: 0.8182 - loss: 0.7818 Trained model, accuracy: 81.82%
# Evaluate the Validation model
loss, acc = model.evaluate(normalized_valid_df, verbose=2)
print("Trained model, accuracy: {:5.2f}%".format(100 * acc))
26/26 - 1s - 20ms/step - accuracy: 0.2353 - loss: 2.6801 Trained model, accuracy: 23.53%
These highlight the innovations and architectural choices that have contributed to AlexNet's effectiveness and its pioneering status in the field of deep learning.
[1] G.E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R.R. Salakhutdinov, "Improving neural networks by preventing co-adaptation of feature detectors," arXiv preprint arXiv:1207.0580, 2012.
[2] A. Krizhevsky and G.E. Hinton, "Using very deep autoencoders for content-based image retrieval," in ESANN, 2011.
[3] K. Jarrett, K. Kavukcuoglu, M. A. Ranzato, and Y. LeCun, "What is the best multi-stage architecture for object recognition?" in International Conference on Computer Vision, IEEE, 2009, pp. 2146–2153.